using a search algorithm ... MT1 Compiler TDL-Files ... We have implemented several search
algorithms, including a genetic algorithm, simulated annealing, pyramid search, window ...
... the percentage improvement that we obtained for the SPARC when optimizing for speed.
Baseline measures were obtained using the batch VPO compiler, which iteratively ... include
additional optimization phases that were not previously exploited by the genetic algorithm. ...
... for an overview see e.g. [1]). However, to obtain an energy-efficient system, optimizing
In [1 2] we have published compiler optimization techniques with the aim of minimizing thi
the GeLIR-code and is then compacted by reusing the genetic algorithm driven code ...
... We have implemented several search algorithms, including a genetic algorithm, simulated
annealing, pyramid search, window search and random search. ... Driver List of 
Transformations MT1 Compiler TDL-Files F77 ... 3 Benchmarks and Platforms ... 
... Niclos G. Fournier.: Enhancement of an Evolutionary Optimizing Compiler. Ph.D Thesis ... Shuvra
S. Bhattacharyya.: A Joint Power/Performance Optimization Algorithm for Multiprocessor ... et al.: 
Automatic parallel I/O performance optimization using Genetic Algorithms. Proceedings ... 
... to optimization has been to have a human expert hand-optimize the application, a ... Genetic
algorithm: Genetic Algorithms are modeled on natural evolution processes and manipulate
individuals in a ... to the target architecture, we used the native Fortran77 or g77 compiler with full ...
... Suppose, for program D in Figure 1, the algorithm locates from its most similar ... approach would
be to initially allow an existing high level restructurer to optimise each new ... technique described
in this paper was implemented in a Java restructuring compiler and evaluated on ...
... and thus has an essential impact on the optimization progress of the genetic algorithm. ... by /%,
whereas the number of memory accesses did not change for these benchmarks. ... The growing 
use of DSPs in embedded systems necessitates optimizing compilers supporting special ... 
... Table 1: Benchmark characteristics #CSE benchmark #CSEs uses CPU[s] cm complex multiply
4 8 19 ... The growing use of DSPs in embedded systems necessitates optimizing compilers which
are ... In this paper we have presented a genetic algorithm driven code generator which ...
... In Section 4, we propose our interactive algorithm for enabling options and in Section 5 we discuss
our experimental framework. ... In our case, columns correspond to compiler options and each
row is a particular compiler setting that can be used to optimize a program. ...
... We also present preliminary performance results of a set of benchmarks and applications
measured on ... propagation, partial redundancy elimination (PRE) and partial dead s 
(PDSE) • Target-specific optimizations ... Compiler Support of the Workqueuing Exe 



... At the same time, it is desirable to tune the performance for the t 
perform code emission to generate the machine instructions for the target architecture in
cooperation ... Version 1.4.0. The threshold in the interpreter to initiate the JIT compiler w



Flexware: a retargetable embedded-software development environment

... This flexible design has become the basis for a more recent MPEG4 codec platform 
that mobile applications use. ... Finally, C compiler developers may use the tool to 
fine-tune compiler optimizations for the target architecture. ... 
... Finally, we propose several optimizations to tune the benchmark. ... Section 4 and section 5 present
compile-time and run-time characteristics of NPCryptBench on Intel ... algorithms and describe
rules that we follow when implementing the benchmark on a proposed target platform. ...



Code size reduction by compiler tuning

... In previous work, we have used so-called the main effect of compiler options to tune the compiler ... 
unswitch-loops 49 old-unroll-loops 50 branch-target-load-optimize 51 branch-target-load- 
optimize2 ... No complex new transformations or other adaptation of the compiler are needed ... 
... library called GASNet [2]. The GASNet library is optimized for a variety of target architectures ...
options: -override limits -O3 -g -tpp2 2back-end compiler options: -fast -O5 -tune host -intrinsics
38 ... IRIX64 V6.5, the MIPSpro Compilers V7.4 and the Berkeley UPC compiler V2.0.14 ...
... Although the language provides shared-memory semantics, the target architecture may not. ... On
a hardware shared memory platform, the transformation is relatively straightforward since ... 3 was
used along with the override-limits option to prevent the compiler from automatically ...
... characteristics of the hardware and software, as well as to tune compiler heuristics and ... may
simply want to obtain the best possible performance on each target platform with the ... that were
designed to provide automatic generation and execution of benchmark programs from ...



Rapidly selecting good compiler optimizations using performance counters
J Cavazos, G Fursin, F Agakov... - ..., 2007. CGO'07. ..., 2007 - ieeexplore.ieee.org
... Using the model Given a new target benchmark, we first extract the performance coui
features x by running the benchmark. This requires 3 runs of the benchmark. ... These 
benchmarks are used by PathScale to tune their compiler suite. ... 
... These primitives provide a platform on which skeletons describing SIMD computations can be
defined. ... of the specification to take advantage of the particular characteristics of an archit
without compromising ... J15] as the source language and using C as the target language ...
M Stephenson, S Amarasinghe, M Martin... - ACM SIGPLAN ..., 2003 - portal.ac
... The algorithm stops merging paths when it has consumed the target architec
resources. ... Trimaran is an integrated compiler and simulator for a parameterized EPIC
architecture. Table 3 details the specific architecture over which we evolved. ... 
... iterative compilation works are limited to specific architectures, limited to spi
or ... The Itanium processor makes a good target architecture since explicitly parallel mai
depend ... Electron is among the best compilers for the Itanium platform, thus providing a
... DEBUGGING SYSTEM WITH PORTABLE DEBUG ENVIRONMENT-INDEPENDENT CLIENT
AND NON-PORTABLE PLATFORM-SPECIFIC SERVER 5 ... translation process varies based
on the compiler program itself, the processor architecture, the target runtime execution ...
... the target can be fed ... Parallelization (compaction) is left for the back-end architecture compiler. ...
In our experience, these items are common in an embedded system development methodology,
where firmware is simulated on a desk-top platform before being used in the field. ...

The Chinook hardware/software co-synthesis system
PH Chou, RB Ortega... - Proceedings of the 8th ..., 1995 - portal.acm.org

... Chinook does not compile code to the target processor(s). It assumes not only the ... heterogeneous
as cost and modularity concerns drive designers to tailor processors to specific functions ... We
modeled this architecture with three handlers, one for the processor requests, one for ...
... is (almost) fully automatic and requires (almost) no knowledge about the compiler or
architecture. ... benchmarks when compiled with GCC 2.6.3 and ran on the SimpleScalar platform. ... 

This shows that tuning compiler settings for a specific application ci ' 
... Page 6. Genetic Programming Applied to Compiler Heuristic Optimization 243 ... Trimaran's compiler,
which is called IMPACT, performs code profiling. Table 3 details the spei
which we evolved. This model is similar to Intel's Itanium architecture. ...
... revision, but also at new implementations of the target Instruction Set Architecture, a new ... a learning



Automatic selection of compiler options using non-parametric inferential statistics

M Haneda, PMW Knijnenburg... - ..., 2005. PACT 2005. ..., 2005 - ieeexplore.ieee.org

... that the best optimization sequence depends on both the application as well as the target
architecture. ... to set back-end compiler switches for any application and architecture automatically.
... As is well known, each application requires its own specific setting of these options to ...
... particularly important codes, the user may want a version that limits its training set to that ... versus
compilation sequences: restricting the set of optimizations to a smaller set that has ...
computers— often have myriad flags that let a benchmarking specialist hand-tune the compiler's ... 

Predicting unroll factors using supervised classification

... The task of a classifier is to learn how best to map
(yi) using all the examples in the training set. While supervised lea 
learned classifier can easily be incorporated into a compiler. 4.2. ... 
... values for which enabling the transformation 1 1 leads to improved performance in the training
set and also ... Note that gathering training data and construction of the model is an offline process,
that is, it would ... These benchmarks are used by PathScale to tune their compiler suite. ...
... more, by evolving a compiler's heuristic over several benchmarks, we can create effective,
general-purpose heuristics. The best general-purpose heuristic our system found for hyperblock
formation improved performance by an average of 25% on our training set, and 9% on ...



MILEPOST GCC: machine learning based research compiler

... Drivers for iterative compilation and model training ... In an additional set of enhancements, a 
coherent event and data passing mechanism enables external plugins to discover the state of 
the compiler and ... ML drivers to optimize programs and tune compiler optimization heuristic ...
... our system found improves the predication algorithm by an average of 25% on our training set,
and 9% on a completely unrelated test set. ... Compiler writers tediously fine-tune priority functions
to achieve suitable performance [2]. Priority functions are widely used and tied to ...
... impact of compiler optimizations on any new program. As a result, we can drastically reduce the
overall simulation time necessary to evaluate tentative architectures and tune programs to ... At first,
it may be surprising that such a small training set size is sufficient to capture such a ...
... This approach is independent of search algorithm, search space or compiler infrastructure and
scales gracefully with the compiler optimization space size. Off-line, a training set of programs
is iteratively evaluated and the shape of the spaces and program features are modelled. ...
... across a variety of applications leave users with little ability to tune performance-critical ... target
for machine learning because the solution space is large and the compiler must make its ... actually
very good general solutions: the heuristics learned on a training set of benchmarks ...
... These limitations reduce the complexity and thus execution time of the individual programs. ... the
training set, but that presumably had something in common with the examples in the training
set. The machine code functions, the individuals in the population take a 32 bit integer as ...
... In this way, provides feedback to the search process and closes the learning loop. ... The resulting
vectors of features, computed for all images from the training set, are the basis for estimating
the utility offer recognizing the objects from the training data. ...



Hybrid engine for polymorphic shellcode detection

U Payer, P Teufl... - Intrusion and Malware Detection and ..., 2006 - Springer
... For the training process the Levenberg-Marquardt [10] back-propagation method was used. ... aad,
aam, aas, daa, das 12 jmp 27 clc, cld, cli, clts, clflush 13 inc, dec 28 cbw, cwd, cdq, cwde 14 loop,
loope, loopne ... Further instructions from the x86 set were then added to the groups. ...



Data mining static code attributes to learn defect predictors

T Menzies, J Greenwald... - IEEE Transactions on Software ..., 2007 - computer.org
... More formally, $$P(H|E)={{P(H)}\over{P(E)}}\prod_iP(E_i|H); $$ i.e., given fragments of evidence
$E_i$ and a ... or "defect-free") is calculated, given the attributes extracted from a module such as 
the lines of code, the McCabe ... A learner is then applied to a training set built from nin. 



Tracking down software bugs using automatic anomaly detection

S Hangal... - ..., 2002 - computer.org

... 3: Sample code from multiprocessor simulator so because DIDUCE did not support disabling 
these checks as easily at that time, and the overhead was not a significant limitation. We set up 
DIDUCE to use the initial part of each simulation run for training, and ignored the ... 
... solver, which has many tight loops with relatively small Map Reduce computations in each loop. ...
Working set selection using second order information for training support vector machines. J.
Mach. ... Fast training of support vector machines using sequential minimal optimization. ...
CG Hunter... - Proteins: Structure, Function, ..., 2003 - interscience.wiley.com
... A variation on this process is to loop through the fragments several times, each time increasing
the ... PDB code Protein Fold class Resolution (Å) Size (n) Mean cRMSD (Å) Max cRMSD (Å) ...
Therefore, the training set size is a limiting factor when building high-resolution basis sets ...

... training set), 216-00-2; 27 (polycyclic training set), 195-00-6; 28 (polycyclic training set),
222-78 ... optimized program a significant part of the search time is spent outside our vectorized
SCREENS loop. ... The user interfacing to QUEST to set up the queries (on the VAX front-end) is ...
... Only a few contributions [1,2,3,12,14,11,8,9] attempt to close the feedback loop of the learning
process at the highest (eg, recognition) level, and test the proposed approach in a real-world
setting. ... Training set Testing set Average fitness over all runs 0.9770±0.0133 ...

A meta-heuristic approach to parallel code generation
... perceptron model to recommend a particular partitioning, selected from a restricted set, to apply ...
Training the neural network requires a representative selection of loops, each of which must ...
characteristics to the data partitioning which gives maximum speed up in loop execution. ...
... All loops in the test suite cc
results for a compiler attempt 



... C and SUIF, and Michael Wolf for building the initial system as well as the loop transformation
library. ... We also want to thank John Ruttenberg for letting us use the Multiflow test suite. The SUIF
compiler project has been supported in part by DARPA contracts N00014-87-K-0828 ...
... 5. Loop Scoring Vendors were mailed a magnetic tape containing the Parallel Loops collection. ...
Thus, the use of compiler directives or interactive compilation features to gain additional 
parallelizations was ... The objective of this test suite has been to provide a measure of system ... 
Iteration of a loop [11]. There is one important case in our application test suite where the
recognition of wrap-around loop bounds is a necessary precursor to the solution of an ...
... In fact, this dual loop paradigm can be found in three commonly used benchmark suites, namely
the Prototype Ada Compiler Evaluation test suite [1], the Performance Issues Working Group
(PIWG) test suite [5] developed by a working group of the Association for Computing ...
... with the language specification, actually passing a slightly higher number of tests in the Jacks
test suite [jac07a] than ... Our Java compiler follows this implementation scheme [EH06 ... 4.3.1 The
enhanced for loop Consider extending Java 1.4 with the enhanced for loop of Java 5; for ...
... An introductory example of Gaussian elimination is used, among other programs in our test
suite, to explain the compilation techniques. ... Arrays a and row are partitioned by compiler directives.
... An array operation in the Fortran90D program is sequentialized into a do loop. ...
... 1. Livermore Fortran Kernels (LFK) A well-known set of 24 Fortran loop kernels developed ... 2.
Argonne National Laboratory's Test Suite (ATS) [1] A set of 100 loops in four categories ...
Dependence Analysis: the ability of a compiler to perform global flow analysis and dependence ...



... though outer loop has much more iterations insufficient work in parallel loop: loop overhead 
dominates ... analysis tools have different thresholds/sensitivities, it is important that the test suite 
is parametrized ... A compiler switch pmfunc directs the compiler to instrument user functions ...
... If the RDS does change radically, the program will still behave correctly, but prefetching will not ...
we performed detailed cycle-by-cycle simulations of the entire Olden benchmark suite [17 ... The
Olden benchmark suite contains ten pointer-based applications written in C, which are ...

Evaluating iterative compilation

G Fursin, M O'Boyle... - Languages and Compilers ... - Springer
... best program version is shown for three of the six different platforms across the three benchmarks. ... 
Otherwise the current best version is retained and we see no change in execution time reduction ...
tion time) from the SPEC benchmark suite in order to find a good optimisation and ... 
... probability, but unlike the space exploring random search algorithm, probabilities can change
over time ... 5.2 Benchmarks We have chosen the UTDSP [15, 19] benchmark suite to evaluate ... This
set of benchmarks contains compute-intensive DSP kernels as well as applications ...
... approach which evolves and adapts to applications and architectural change, without sacrificing
performance. ... This means that for each benchmark, the system has previously seen and optimised
the other fifteen benchmarks which act as training examples. ...
... The use of a standard benchmark problem facilitates comparisons of various modeling approaches. ...
Modify Unit Test Package 2.9.1. Description This step involves the modification of the ... Subsequent
iterations of this step may be based upon feedback from testing, indicating that ...
... In this equation, ATo.e-an refers to the change in total run time. Tcompi- refers to the amount
of time it ... of sample profiling: It presents a detailed performance evaluation of the overhead